Cooperative Heterogeneous Deep Reinforcement Learning
Han Zheng, Pengfei Wei, Jing Jiang, Guodong Long, Qinghua Lu, Chengqi Zhang
Numerous deep reinforcement learning agents have been proposed, and each of them has its strengths and flaws. In this work, we present a Cooperative Heterogeneous Deep Reinforcement Learning (CHDRL) framework that can learn a policy by integrating the advantages of heterogeneous agents. Specifically, we propose a cooperative learning framework that classifies heterogeneous agents into two classes: global agents and local agents. Global agents are off-policy agents that can utilize experiences from the other agents. Local agents are either on-policy agents or population-based evolutionary algorithm (EA) agents that can explore the local area effectively. We employ global agents, which are sample-efficient, to guide the learning of local agents so that local agents can benefit from the sample-efficient agents and simultaneously maintain their advantages, e.g., stability. Global agents also benefit from effective local searches. Experimental studies on a range of continuous control tasks from the MuJoCo benchmark show that CHDRL achieves better performance compared with state-of-the-art baselines.
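To make the cooperation scheme described above concrete, here is a minimal Python sketch of one CHDRL iteration. Everything below is an illustrative assumption rather than the paper's actual implementation: the StubAgent class, the shared ReplayBuffer, the buffer sizes, and the rule that copies the global agent's parameters into a local agent when the global policy evaluates better are stand-ins for whatever mechanisms the authors actually use.

```python
import copy
import random
from collections import deque

# Hypothetical sketch of the CHDRL cooperation loop. Class names, method
# names, and the guidance rule are illustrative assumptions, not the
# paper's actual interfaces.

class ReplayBuffer:
    """FIFO experience buffer; the global buffer pools all agents' data."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add_many(self, transitions):
        self.data.extend(transitions)

    def sample(self, batch_size):
        pool = list(self.data)
        return random.sample(pool, min(batch_size, len(pool)))

class StubAgent:
    """Placeholder for an RL agent; rollout/update/score are stand-ins."""
    def __init__(self, name):
        self.name = name
        self.params = {"w": random.random()}  # stand-in for policy weights

    def rollout(self, n=10):
        # Return fake (state, action, reward, next_state) transitions.
        return [(0, 0, random.random(), 0) for _ in range(n)]

    def update(self, batch):
        pass  # the agent's own off-policy / on-policy update would go here

    def score(self):
        return self.params["w"]  # stand-in for an evaluation return

def chdrl_iteration(global_agent, local_agents, global_buf):
    # 1. All agents collect experience; because the global agent is
    #    off-policy, it can consume every transition regardless of which
    #    policy produced it.
    for agent in [global_agent] + local_agents:
        global_buf.add_many(agent.rollout())

    # 2. The sample-efficient global agent learns from the pooled buffer.
    global_agent.update(global_buf.sample(batch_size=256))

    # 3. Guidance: if the global policy currently evaluates better than a
    #    local one, copy its parameters into the local agent (one plausible
    #    reading of "guiding the learning of local agents").
    for local in local_agents:
        if global_agent.score() > local.score():
            local.params = copy.deepcopy(global_agent.params)

# Usage: one global off-policy agent plus on-policy / EA local agents.
buf = ReplayBuffer(capacity=100_000)
locals_ = [StubAgent("ppo_local"), StubAgent("ea_local")]
chdrl_iteration(StubAgent("sac_global"), locals_, buf)
```

The structural point of the sketch is that only the off-policy global agent consumes the pooled experience, while the local agents keep their own learning rules (and hence their stability or exploration properties) and merely receive periodic guidance.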
Review for NeurIPS paper: Cooperative Heterogeneous Deep Reinforcement Learning
The exact mechanics of the policy transfer between the different algorithms are not given. Given the content, I may assume that "transfer" means a simple copying of the parameters, but I remain unsure. When augmenting the experience buffer with data from other algorithms, it would be nice to clarify why this does (or does not) introduce any bias into the data. It seems that the different parts of the framework could be replaced by a different way of "tinkering" with an algorithm or its hyperparameters. E.g., the auxiliary on-policy algorithms are here mainly for exploration, but the exploration of the main off-policy algorithm itself can be easily controlled, and I suspect that, with the right settings, it can work as well as the given complicated framework. The global and local experience buffers seem more like a hack.
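For readers puzzling over the same point, one plausible shape of the global/local buffer split the reviewer questions is sketched below. This is a hypothetical illustration, not the paper's code: the off-policy global agent may learn from every agent's transitions, whereas an on-policy local agent trains only on its own recent rollouts, since mixing in off-policy data would bias its on-policy gradient estimates, which is presumably the bias concern raised above.

```python
from collections import deque

# Hypothetical illustration of a global/local buffer split; buffer sizes
# and the "own data only" rule for local agents are assumptions.

class DualBuffers:
    def __init__(self):
        self.global_buf = deque(maxlen=1_000_000)  # all agents' transitions
        self.local_bufs = {}                       # per-agent recent rollouts

    def record(self, agent_id, transitions):
        # The off-policy global agent may learn from any behavior policy,
        # so every transition goes into the global buffer.
        self.global_buf.extend(transitions)
        # An on-policy local agent must train only on data from its own
        # current policy; reusing other agents' transitions would bias its
        # gradient estimates.
        buf = self.local_bufs.setdefault(agent_id, deque(maxlen=2_048))
        buf.extend(transitions)

buffers = DualBuffers()
buffers.record("ppo_local", [("s", "a", 1.0, "s2")])
```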
Review for NeurIPS paper: Cooperative Heterogeneous Deep Reinforcement Learning
Following the rebuttals, all four reviewers agreed that this paper should be accepted. While there are remaining questions around the hyperparameters, performance relative to other methods, and computational cost, this is an interesting and novel line of work. The authors are encouraged to proofread the paper thoroughly and address the issues raised by the reviewers.